24 research outputs found

An Introspective Comparison of Random Forest-Based Classifiers for the Analysis of Cluster-Correlated Data by Way of RF++

Author: A Vlahou
Alan R. Dabney
Anthony P. Leclerc
AR Dabney
B Efron
B Rosner
B Wu
BL Adam
C Strobl
D Agranoff
DS Palmer
EF Petricoin
EJ Finehout
Elizabeth G. Hill
ET Fung
Fabio Rapallo
G Izmirlian
GA Churchill
H Zhang
JM Koomen
Jonas S. Almeida
JR Quinlan
JS Morris
L Breiman
L Li
LE Breiman
M Hilario
MR Segal
PJ Adam
RW Garden
S Schaub
SK Lee
TM Pawlik
TP Conrads
V Svetnik
Y Yasui
YD Chen
Yuliya V. Karpievitch
YV Karpievitch
Publication venue: Public Library of Science
Publication date: 01/01/2009

Many mass spectrometry-based studies, as well as other biological experiments, produce cluster-correlated data. Failure to account for correlation among observations may result in a classification algorithm overfitting the training data and producing overoptimistic estimated error rates, and may make subsequent classifications unreliable. Current common practice for dealing with replicated data is to average each subject's replicate sample set, reducing the dataset size and incurring loss of information. In this manuscript we compare three approaches to dealing with cluster-correlated data: unmodified Breiman's Random Forest (URF), a forest grown using subject-level averages (SLA), and RF++ with subject-level bootstrapping (SLB). RF++, a novel Random Forest-based algorithm implemented in C++, handles cluster-correlated data through a modification of the original resampling algorithm and accommodates subject-level classification. Subject-level bootstrapping is an alternative sampling method that obviates the need to average or otherwise reduce each set of replicates to a single independent sample. Our experiments show nearly identical median classification and variable selection accuracy for SLB forests and URF forests when applied to both simulated and real datasets. However, the run-time estimated error rate was severely underestimated for URF forests. Predictably, SLA forests were found to be more severely affected by the reduction in sample size, which led to poorer classification and variable selection accuracy. Perhaps most importantly, our results suggest that it is reasonable to utilize URF for the analysis of cluster-correlated data. Two caveats should be noted: first, correct classification error rates must be obtained using a separate test dataset, and second, an additional post-processing step is required to obtain subject-level classifications. RF++ is shown to be an effective alternative for classifying both clustered and non-clustered data.
Source code and stand-alone compiled versions of command-line and easy-to-use graphical user interface (GUI) versions of RF++ for Windows and Linux, as well as a user manual (Supplementary File S2), are available for download at: http://sourceforge.org/projects/rfpp/ under the GNU public license.
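
The idea of subject-level bootstrapping described in the abstract above can be illustrated with a minimal sketch. It is not taken from RF++; it assumes that resampling draws whole subjects with replacement, so that all replicates of a subject land together either in the bootstrap sample or in the out-of-bag set. The function name, the `subject_ids` layout, and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def subject_level_bootstrap(subject_ids, rng):
    """Resample whole subjects; return (in_bag, out_of_bag) row indices.

    Assumption for illustration: subject_ids[i] names the subject that
    replicate (row) i belongs to.
    """
    rows_by_subject = defaultdict(list)
    for row, subject in enumerate(subject_ids):
        rows_by_subject[subject].append(row)

    subjects = sorted(rows_by_subject)
    # Draw as many subjects as exist, with replacement.
    drawn = [rng.choice(subjects) for _ in subjects]

    # Every replicate of a drawn subject goes in-bag (repeats included).
    in_bag = [row for s in drawn for row in rows_by_subject[s]]

    # Subjects never drawn are out-of-bag with all of their replicates.
    drawn_set = set(drawn)
    out_of_bag = [
        row for s in subjects if s not in drawn_set for row in rows_by_subject[s]
    ]
    return in_bag, out_of_bag


# Toy data: four subjects with two or three replicate spectra each.
subject_ids = ["s1", "s1", "s1", "s2", "s2", "s3", "s3", "s3", "s4", "s4"]
in_bag, oob = subject_level_bootstrap(subject_ids, random.Random(0))
```

Because no subject contributes rows to both sets, an out-of-bag error computed this way is not inflated by replicates of a training subject, which is the failure mode the abstract reports for URF.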

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Texas A&M Repository

Higher education delays and shortens cognitive impairment. A multistate life table analysis of the US Health and Retirement Study

Author: A Hofman
A Ruitenberg
AA Mamun
AC Thiebaut
AF Jorm
AL Fitzpatrick
AR Herzog
AS Karlamangla
BJ Gurland
C Brayne
C Qiu
CP Ferri
D Gustafson
DF Sullivan
DL Bachman
EC Gorospe
Frans J. Willekens
G Izmirlian
G Maskarinec
GJ Biessels
H Kramer
J Barendregt
J Brandt
JJ Barendregt
JJ Llibre Rodriguez
JK Ujcic-Voortman
K Langa
K Suthers
KJ Anstey
KM Langa
KM Mehta
L Fratiglioni
L Zhu
Luc Bonneux
M Kivipelto
M Oijen van
M Reuser
MF Folstein
Mieke Reuser
MJ Stampfer
MM Glymour
MX Tang
RG Rogers
S Natarajan
T Anttila
TP Ng
Y Stern
Publication venue: Springer Netherlands
Publication date: 01/01/2011

Improved health may extend or shorten the duration of cognitive impairment by postponing incidence or death. We assess the duration of cognitive impairment in the US Health and Retirement Study (1992–2004) by self-reported BMI, smoking and levels of education in men and women and three ethnic groups. We define multistate life tables by the transition rates to cognitive impairment, recovery and death, and estimate Cox proportional hazard ratios for the studied determinants. Ninety-five percent confidence intervals are obtained by bootstrapping. White men and women aged 55 expect to live 25.4 and 30.0 years, of which 1.7 [95% confidence interval 1.5; 1.9] years and 2.7 [2.4; 2.9] years with cognitive impairment. Both black men and women live 3.7 [2.9; 4.5] years longer with cognitive impairment than whites; Hispanic men and women, 3.2 [1.9; 4.6] and 5.8 [4.2; 7.5] years. BMI makes no difference. Smoking decreases the duration of cognitive impairment by 0.8 [0.4; 1.3] years through high mortality. Highly educated men and women live longer, but 1.6 years [1.1; 2.2] and 1.9 years [1.6; 2.6] shorter with cognitive impairment than men and women with low education. The effect of education is more pronounced among ethnic minorities. Higher life expectancy goes together with a longer period of cognitive impairment, but not for higher levels of education: that extends life in good cognitive health but shortens the period of cognitive impairment. The increased duration of cognitive impairment in minority ethnic groups needs further study, also in Europe.
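
A multistate life table of the kind mentioned in the abstract above can be sketched in a few lines. This is a simplified illustration, not the study's method: it assumes constant annual transition probabilities rather than age-specific rates and hazard ratios, and it counts state occupancy at the start of each year. The three states, the matrix `P`, and all numbers are hypothetical.

```python
def expected_years(p, start, horizon):
    """Expected years spent in each state over `horizon` annual steps.

    p[i][j] is the (assumed constant) annual probability of moving from
    state i to state j; `start` is the initial distribution over states.
    """
    n = len(p)
    occupancy = list(start)
    years = [0.0] * n
    for _ in range(horizon):
        # Credit one year to each state in proportion to current occupancy.
        for i in range(n):
            years[i] += occupancy[i]
        # Advance the cohort one year through the transition matrix.
        occupancy = [sum(occupancy[i] * p[i][j] for i in range(n)) for j in range(n)]
    return years


# States: 0 = cognitively healthy, 1 = cognitively impaired, 2 = dead.
# Hypothetical probabilities; recovery is the 1 -> 0 transition.
P = [
    [0.93, 0.04, 0.03],
    [0.10, 0.75, 0.15],
    [0.00, 0.00, 1.00],
]
years = expected_years(P, start=[1.0, 0.0, 0.0], horizon=45)
healthy_years, impaired_years = years[0], years[1]
```

Total life expectancy in this sketch is `healthy_years + impaired_years`. Bootstrapped confidence intervals, as used in the study, would come from repeating the estimation of the transition probabilities on resampled data.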

Crossref

Proceedings - University of Groningen

University of Groningen

Springer - Publisher Connector

ARTS repository - University of Groningen

PubMed Central

KNAW Repository

Dissertations of the University of Groningen

Small-Sample Error Estimation for Bagged Classification Rules

Author: A Assareh
A Bhattacharjee
A Statnikov
B Efron
B Wu
B Zhang
B-L Adam
EC Gunther
G Izmirlian
G Martínez-Muñoz
HJ Issaq
L Breiman
L Xu
LJ Van't Veer
MJ van de Vijver
P Geurts
R Díaz-Uriarte
RE Banfield
RE Schapire
RO Duda
S Alvarez
T Bylander
TT Vu
U Braga-Neto
UM Braga-Neto
W Tong
Y Freund
Publication venue: Hindawi Limited
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Texas A&M Repository

A survey of computational tools for downstream analysis of proteomic and other omic datasets

Crossref

Classification of premalignant pancreatic cancer mass-spectrometry data using decision tree ensembles

Author: A Vlahou
AB Lowenfels
B Wu
BA Lashner
BL Adam
C Belluco
C Rosty
CJ Scarlett
CN White
D Brady
D Cecconi
D Li
DH Wolpert
DK Ornstein
EF Petricoin
EM Posadas
EP Diamandis
ET Fung
G Alexe
G Bhanot
G Izmirlian
G William Wong
G Zhang
GI Webb
Guangtao Ge
H Neubauer
H Wang
I Guyon
I Levner
IH Witten
J Friedman
J Yu
JD Wulfkuhle
JR Quinlan
JS Yu
K Mikuriya
K Ning
KA Baggerly
KM Ting
KR Coombes
L Andrade
L Breiman
L Li
L Todorovski
L Zhou
M Gronborg
M Jafari
M Roesch-Ely
M Wagner
P Alfonso
P Geurts
R Chen
R Marcuson
RE Schapire
SA Schwartz
SQ Wang
SR Hingorani
T Crnogorac-Jurcevic
TM Mitchell
TP Conrads
W Wang
Y Qu
YD Cai
Z Wang
Publication venue: Springer Science and Business Media LLC

Crossref

Detecting a difference – assessing generalisability when modelling metabolome fingerprint data in longer term studies of genetically modified plants

Author: A. Buchholz
B. Efron
B. Wu
C.E. Thomaz
D.A. Fell
D.B. Kell
D.I. Broadhurst
D.P. Enot
David P. Enot
E.J. Kok
E.M. Hellwege
G. Izmirlian
G.G. Harrigan
G.S. Catchpole
H.A. Kuiper
H.E. Johnson
J. Allen
J. Kopka
J.H. Zar
John Draper
L. Breiman
L.V. Shepherd
L.W. Sumner
M. Defernez
M.E. Hansen
M.R. Viant
Manfred Beckmann
N. Schauer
N.V. Reo
O. Fiehn
O. Langsrud
P. Jonsson
R. Goodacre
R.J. Bino
R.L. Somorjai
R.M. Jarvis
S. Singh
S.O. Hagan
T. Sing
U. Roessner
W.B. Dunn
Publication venue: Springer Science and Business Media LLC

Crossref

Resolving paradoxes involving surrogate end points

Author: Grant Izmirlian
Stuart G. Baker
Victor Kipnis

We define a surrogate end point as a measure or indicator of a biological process that is obtained sooner, at less cost or less invasively than a true end point of health outcome and is used to make conclusions about the effect of an intervention on the true end point. Prentice presented criteria for valid hypothesis testing of a surrogate end point that replaces a true end point. For using the surrogate end point to estimate the predicted effect of intervention on the true end point, Day and Duffy assumed the Prentice criterion and arrived at two paradoxical results: the estimated predicted intervention effect by using a surrogate can give more precise estimates than the usual estimate of the intervention effect by using the true end point and the variance is greatest when the surrogate end point perfectly predicts the true end point. Begg and Leung formulated similar paradoxes and concluded that they indicate a flawed conceptual strategy arising from the Prentice criterion. We resolve the paradoxes as follows. Day and Duffy compared a surrogate-based estimate of the effect of intervention on the true end point with an estimate of the effect of intervention on the true end point that uses the true end point. Their paradox arose because the former estimate assumes the Prentice criterion whereas the latter does not. If both or neither of these estimates assume the Prentice criterion, there is no paradox. The paradoxes of Begg and Leung, although similar to those of Day and Duffy, arise from ignoring the variability of the parameter estimates irrespective of the Prentice criterion and disappear when the variability is included. Our resolution of the paradoxes provides a firm foundation for future meta-analytic extensions of the approach of Day and Duffy. Copyright 2005 Royal Statistical Society.

Research Papers in Economics

Sex differences in the prevalence of mobility disability in old age: The dynamics of incidence, recovery, and mortality

Author: Jack M. Guralnik
Grant Izmirlian
Suzanne G. Leveille
David Melzer
Brenda W.J.H. Penninx
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2000

Objectives. This study examined sex differences in the prevalence of mobility disability in older adults according to the influences of three components of prevalence: disability incidence, recovery from disability, and mortality. Methods. Participants in a population-based study of older adults from three communities in the United States (N = 10,263) were studied for up to 7 years. Life table methods were used to estimate the influence of each of the three components of disability prevalence in women and men. Sex differences in probabilities for transition states were measured by relative risks derived from a single model using a Markov chain approach. Results. The proportion of disabled women increased from 22% of women aged 70 years to 81% of those aged 90 years. In men, comparable figures were 15% and 57%. Incidence had the greatest impact on the sex differences in disability prevalence until age 90 and older, when recovery rates had a greater impact on differences in prevalence. Mortality differences in men and women had only a modest impact on sex differences in disability prevalence. These findings initially seemed to contradict striking sex differences observed in the relative risks for mortality in men compared with women. Subsequent graphical analyses showed that incidence rather than recovery or mortality largely accounted for sex differences in disability prevalence in old age. Conclusion. Disability incidence, recovery from disability, and mortality dynamically influence the sex differences in the prevalence of mobility disability. However, incidence has the greatest impact overall on the higher prevalence of disability in women compared with men.

Prostate-Specific Antigen Screening Trials and Prostate Cancer Deaths: The Androgen Deprivation Connection

Author: Attems
Banach-Petrosky
Craft
Das
Denham
G. L. Gabor Miklos
Hugosson
I. E. Haines
Izmirlian
Janoff
Lu-Yao
Moyer
Nanda
Newschaffer
Schroder
Shahinian
Theoret
Vasudeva
Wilt
Wong
Publication venue: Oxford University Press (OUP)

Crossref